## [1] "fixed.acidity" "volatile.acidity" "citric.acid"
## [4] "residual.sugar" "chlorides" "free.sulfur.dioxide"
## [7] "total.sulfur.dioxide" "density" "pH"
## [10] "sulphates" "alcohol" "quality"
## 'data.frame': 1599 obs. of 12 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity: num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.SO2 : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.SO2 : num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## fixed.acidity volatile.acidity citric.acid sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.SO2 total.SO2 density
## Min. :0.01200 Min. : 1.00 Min. : 6.00 Min. :0.9901
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00 1st Qu.:0.9956
## Median :0.07900 Median :14.00 Median : 38.00 Median :0.9968
## Mean :0.08747 Mean :15.87 Mean : 46.47 Mean :0.9967
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00 3rd Qu.:0.9978
## Max. :0.61100 Max. :72.00 Max. :289.00 Max. :1.0037
## pH sulphates alcohol quality
## Min. :2.740 Min. :0.3300 Min. : 8.40 Min. :3.000
## 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50 1st Qu.:5.000
## Median :3.310 Median :0.6200 Median :10.20 Median :6.000
## Mean :3.311 Mean :0.6581 Mean :10.42 Mean :5.636
## 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10 3rd Qu.:6.000
## Max. :4.010 Max. :2.0000 Max. :14.90 Max. :8.000
The median quality score is 6, the most common score is 5 (681 wines, against 638 with a score of 6), and scores range from 3 to 8. The mean alcohol content is 10.42%.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
The fixed.acidity distribution appears roughly normal with a slight right skew (the mean, 8.32, is above the median, 7.9).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3900 0.5200 0.5278 0.6400 1.5800
The volatile.acidity distribution appears multimodal, with peaks near 0.4, 0.5 and 0.6.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
## feature
## 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14
## 132 33 50 30 29 20 24 22 33 30 35 15 27 18 21
## 0.15 0.16 0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
## 19 9 16 22 21 25 33 27 25 51 27 38 20 19 21
## 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 0.41 0.42 0.43 0.44
## 30 30 32 25 24 13 20 19 14 28 29 16 29 15 23
## 0.45 0.46 0.47 0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59
## 22 19 18 23 68 20 13 17 14 13 12 8 9 9 8
## 0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.7 0.71 0.72 0.73 0.74
## 9 2 1 10 9 7 14 2 11 4 2 1 1 3 4
## 0.75 0.76 0.78 0.79 1
## 1 3 1 1 1
The citric acid distribution appears multimodal, with peaks at 0, 0.24 and 0.49 (132, 51 and 68 wines respectively).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
## feature
## 0.9 1.2 1.3 1.4 1.5 1.6 1.65 1.7 1.75 1.8 1.9 2 2.05 2.1 2.15
## 2 8 5 35 30 58 2 76 2 129 117 156 2 128 2
## 2.2 2.25 2.3 2.35 2.4 2.5 2.55 2.6 2.65 2.7 2.8 2.85 2.9 2.95 3
## 131 1 109 1 86 84 1 79 1 39 49 1 24 1 25
## 3.1 3.2 3.3 3.4 3.45 3.5 3.6 3.65 3.7 3.75 3.8 3.9 4 4.1 4.2
## 7 15 11 15 1 2 8 1 4 1 8 6 11 6 5
## 4.25 4.3 4.4 4.5 4.6 4.65 4.7 4.8 5 5.1 5.15 5.2 5.4 5.5 5.6
## 1 8 4 4 6 2 1 3 1 5 1 3 1 8 6
## 5.7 5.8 5.9 6 6.1 6.2 6.3 6.4 6.55 6.6 6.7 7 7.2 7.3 7.5
## 1 4 3 4 4 3 2 3 2 2 2 1 1 1 1
## 7.8 7.9 8.1 8.3 8.6 8.8 8.9 9 10.7 11 12.9 13.4 13.8 13.9 15.4
## 2 3 2 3 1 2 1 1 1 2 1 1 2 1 2
## 15.5
## 1
## 90%
## 3.6
I transformed the long-tailed data to better understand the distribution of sugar. The distribution appears to be right skewed: 90% of the wines have residual sugar of at most 3.6 g/dm^3, and the maximum is 15.5 g/dm^3, far below the 45 g/dm^3 above which a wine is considered sweet.
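The 90% figure can be re-derived from the frequency table printed above. The following is a minimal sketch, not the author's code: it rebuilds the sugar column from the printed values and counts, and it assumes a log10 transform for the long tail (the report does not say which transform was used). The names `sugar_values`, `sugar_counts`, `p90`, `log_sugar` and `n_sweet` are illustrative.

```r
# Values and counts copied from the frequency table printed above.
sugar_values <- c(
  0.9, 1.2, 1.3, 1.4, 1.5, 1.6, 1.65, 1.7, 1.75, 1.8, 1.9, 2, 2.05, 2.1, 2.15,
  2.2, 2.25, 2.3, 2.35, 2.4, 2.5, 2.55, 2.6, 2.65, 2.7, 2.8, 2.85, 2.9, 2.95, 3,
  3.1, 3.2, 3.3, 3.4, 3.45, 3.5, 3.6, 3.65, 3.7, 3.75, 3.8, 3.9, 4, 4.1, 4.2,
  4.25, 4.3, 4.4, 4.5, 4.6, 4.65, 4.7, 4.8, 5, 5.1, 5.15, 5.2, 5.4, 5.5, 5.6,
  5.7, 5.8, 5.9, 6, 6.1, 6.2, 6.3, 6.4, 6.55, 6.6, 6.7, 7, 7.2, 7.3, 7.5,
  7.8, 7.9, 8.1, 8.3, 8.6, 8.8, 8.9, 9, 10.7, 11, 12.9, 13.4, 13.8, 13.9, 15.4,
  15.5
)
sugar_counts <- c(
  2, 8, 5, 35, 30, 58, 2, 76, 2, 129, 117, 156, 2, 128, 2,
  131, 1, 109, 1, 86, 84, 1, 79, 1, 39, 49, 1, 24, 1, 25,
  7, 15, 11, 15, 1, 2, 8, 1, 4, 1, 8, 6, 11, 6, 5,
  1, 8, 4, 4, 6, 2, 1, 3, 1, 5, 1, 3, 1, 8, 6,
  1, 4, 3, 4, 4, 3, 2, 3, 2, 2, 2, 1, 1, 1, 1,
  2, 3, 2, 3, 1, 2, 1, 1, 1, 2, 1, 1, 2, 1, 2,
  1
)

# One entry per wine, in ascending order of residual sugar.
sugar <- rep(sugar_values, times = sugar_counts)

# 90th percentile (R's default quantile type 7).
p90 <- quantile(sugar, 0.9)

# Assumed transform for the long right tail; the report does not name it.
log_sugar <- log10(sugar)

# Number of wines above the 45 g/dm^3 "sweet" threshold.
n_sweet <- sum(sugar > 45)
```

A histogram of `log_sugar` (for example with `hist(log_sugar)`) spreads out the bulk of the data that the raw scale squeezes against the left edge.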
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
## 95%
## 0.1261
I transformed the long-tailed data to better understand the distribution of chlorides. The distribution appears to be right skewed: 95% of the wines have chlorides of at most 0.1261 g/dm^3.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 14.00 15.87 21.00 72.00
## 95%
## 35
I transformed the long-tailed data to better understand the distribution of free.SO2. The free.SO2 distribution is bimodal, with peaks at 7 and 17.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 22.00 38.00 46.47 62.00 289.00
## 95%
## 112.1
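I transformed the long-tailed data to better understand the distribution of total.SO2. The raw data are right skewed (mean 46.47 against a median of 38, with a maximum of 289); after the transformation the total.SO2 distribution looks approximately normal.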
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.210 3.310 3.311 3.400 4.010
## feature
## 2.74 2.86 2.87 2.88 2.89 2.9 2.92 2.93 2.94 2.95 2.98 2.99 3 3.01 3.02
## 1 1 1 2 4 1 4 3 4 1 5 2 6 5 8
## 3.03 3.04 3.05 3.06 3.07 3.08 3.09 3.1 3.11 3.12 3.13 3.14 3.15 3.16 3.17
## 6 10 8 10 11 11 11 19 9 20 13 21 34 36 27
## 3.18 3.19 3.2 3.21 3.22 3.23 3.24 3.25 3.26 3.27 3.28 3.29 3.3 3.31 3.32
## 30 25 39 36 39 32 29 26 53 35 42 46 57 39 45
## 3.33 3.34 3.35 3.36 3.37 3.38 3.39 3.4 3.41 3.42 3.43 3.44 3.45 3.46 3.47
## 37 43 39 56 37 48 48 37 34 33 17 29 20 22 21
## 3.48 3.49 3.5 3.51 3.52 3.53 3.54 3.55 3.56 3.57 3.58 3.59 3.6 3.61 3.62
## 19 10 14 15 18 17 16 8 11 10 10 8 7 8 4
## 3.63 3.66 3.67 3.68 3.69 3.7 3.71 3.72 3.74 3.75 3.78 3.85 3.9 4.01
## 3 4 3 5 4 1 4 3 1 1 2 1 2 2
The pH distribution is normal.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9901 0.9956 0.9968 0.9967 0.9978 1.0040
The density distribution appears to be normal, and the difference between the minimum and maximum is only 0.014 g/cm^3. For comparison, the difference in density between water and ethanol is about 0.21 g/cm^3.
Ref: https://en.wikipedia.org/wiki/Ethanol
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.5500 0.6200 0.6581 0.7300 2.0000
## 95%
## 0.93
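I transformed the long-tailed data to better understand the distribution of sulphates. The raw data have a long right tail (maximum 2.0 against a median of 0.62); after the transformation the distribution appears approximately normal. 95% of the wines have sulphates of at most 0.93.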
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
The distribution for alcohol appears to be right skewed.
## feature
## 3 4 5 6 7 8
## 10 53 681 638 199 18
## [1] 0.9493433
Most wines in the data have a quality between 5 and 7 (94.9%). I will convert this feature to a factor for the multivariate analysis.
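The 94.9% share and the factor conversion can be sketched from the quality counts printed above. This is an illustration under stated assumptions, not the author's code: the cut points (3-5 = normal, 6 = good, 7-8 = excellent) are inferred because they reproduce the column totals of the rating tables printed later in this report (744, 638 and 217), and the names `quality_counts`, `share_5_to_7` and `rating_counts` are illustrative.

```r
# Quality counts copied from the table printed above.
quality_counts <- c(`3` = 10, `4` = 53, `5` = 681, `6` = 638, `7` = 199, `8` = 18)

# One entry per wine.
quality <- rep(as.integer(names(quality_counts)), times = quality_counts)

# Share of wines with a quality score from 5 to 7.
share_5_to_7 <- mean(quality >= 5 & quality <= 7)

# Assumed cut points: (2,5] = normal, (5,6] = good, (6,8] = excellent.
wine_rating <- cut(
  quality,
  breaks = c(2, 5, 6, 8),
  labels = c("normal", "good", "excellent"),
  ordered_result = TRUE
)

rating_counts <- table(wine_rating)
```

An ordered factor keeps the ranking normal < good < excellent, which is useful when the rating is used as a grouping or colour variable in plots.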
ANS: There are 1599 wines in the data set, with 12 features.
## 'data.frame': 1599 obs. of 12 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity: num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.SO2 : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.SO2 : num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## fixed.acidity volatile.acidity citric.acid sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.SO2 total.SO2 density
## Min. :0.01200 Min. : 1.00 Min. : 6.00 Min. :0.9901
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00 1st Qu.:0.9956
## Median :0.07900 Median :14.00 Median : 38.00 Median :0.9968
## Mean :0.08747 Mean :15.87 Mean : 46.47 Mean :0.9967
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00 3rd Qu.:0.9978
## Max. :0.61100 Max. :72.00 Max. :289.00 Max. :1.0037
## pH sulphates alcohol quality
## Min. :2.740 Min. :0.3300 Min. : 8.40 Min. :3.000
## 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50 1st Qu.:5.000
## Median :3.310 Median :0.6200 Median :10.20 Median :6.000
## Mean :3.311 Mean :0.6581 Mean :10.42 Mean :5.636
## 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10 3rd Qu.:6.000
## Max. :4.010 Max. :2.0000 Max. :14.90 Max. :8.000
## fixed.acidity volatile.acidity citric.acid sugar chlorides free.SO2
## 1 7.4 0.70 0.00 1.9 0.076 11
## 2 7.8 0.88 0.00 2.6 0.098 25
## 3 7.8 0.76 0.04 2.3 0.092 15
## 4 11.2 0.28 0.56 1.9 0.075 17
## 5 7.4 0.70 0.00 1.9 0.076 11
## 6 7.4 0.66 0.00 1.8 0.075 13
## total.SO2 density pH sulphates alcohol quality
## 1 34 0.9978 3.51 0.56 9.4 5
## 2 67 0.9968 3.20 0.68 9.8 5
## 3 54 0.9970 3.26 0.65 9.8 5
## 4 60 0.9980 3.16 0.58 9.8 6
## 5 34 0.9978 3.51 0.56 9.4 5
## 6 40 0.9978 3.51 0.56 9.4 5
Input variables (based on physicochemical tests):
1. fixed acidity (tartaric acid - g / dm^3): most acids involved with wine are fixed or nonvolatile (do not evaporate readily)
2. volatile acidity (acetic acid - g / dm^3): the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste
3. citric acid (g / dm^3): found in small quantities, citric acid can add ‘freshness’ and flavor to wines
4. residual sugar (g / dm^3): the amount of sugar remaining after fermentation stops; it’s rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet
5. chlorides (sodium chloride - g / dm^3): the amount of salt in the wine
6. free sulfur dioxide (mg / dm^3): the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine
7. total sulfur dioxide (mg / dm^3): amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
8. density (g / cm^3)
9. pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale
10. sulphates (potassium sulphate - g / dm^3): a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
11. alcohol (% by volume): the percent alcohol content of the wine
Output variable (based on sensory data): 12. - quality (score between 0 and 10)
ANS: The main feature of interest is the wine’s quality. I would like to investigate which variable(s) affect wine quality.
investigation into your feature(s) of interest? ANS: I think smell, taste, touch and addictive content (alcohol) affect a wine’s quality, so the features I chose for investigation are:
## fixed.acidity volatile.acidity citric.acid sugar chlorides free.SO2
## 1 7.4 0.70 0.00 1.9 0.076 11
## 2 7.8 0.88 0.00 2.6 0.098 25
## 3 7.8 0.76 0.04 2.3 0.092 15
## 4 11.2 0.28 0.56 1.9 0.075 17
## 5 7.4 0.70 0.00 1.9 0.076 11
## 6 7.4 0.66 0.00 1.8 0.075 13
## total.SO2 density pH sulphates alcohol quality sourness
## 1 34 0.9978 3.51 0.56 9.4 5 5.1800
## 2 67 0.9968 3.20 0.68 9.8 5 5.4600
## 3 54 0.9970 3.26 0.65 9.8 5 5.4784
## 4 60 0.9980 3.16 0.58 9.8 6 8.0976
## 5 34 0.9978 3.51 0.56 9.4 5 5.1800
## 6 40 0.9978 3.51 0.56 9.4 5 5.1800
Yes, I created “sourness” from fixed.acidity and citric.acid to represent the sourness of the wine.
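The report does not show how sourness was computed. The six printed rows are consistent with a weighted sum, 0.7 * fixed.acidity + 0.46 * citric.acid. The sketch below checks that guess against the printed values; the weights are inferred from those six rows only and are an assumption, and `wine_head` and `printed_sourness` are illustrative names.

```r
# First six rows of the two input columns, copied from the output above.
wine_head <- data.frame(
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4),
  citric.acid   = c(0, 0, 0.04, 0.56, 0, 0)
)

# Assumed formula: the weights are inferred from the printed rows,
# not taken from the author's code.
wine_head$sourness <- 0.7 * wine_head$fixed.acidity + 0.46 * wine_head$citric.acid

# Sourness values as printed in the output above.
printed_sourness <- c(5.1800, 5.4600, 5.4784, 8.0976, 5.1800, 5.1800)

# TRUE when the assumed formula reproduces the printed column.
matches_printed <- isTRUE(all.equal(wine_head$sourness, printed_sourness))
```

Six rows cannot rule out other formulas that happen to agree on them, so the weights should be confirmed against the original analysis code before being reused.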
you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?
ANS: The distributions of citric acid, volatile.acidity and free.SO2 appear to have more than one peak. I tidied the data by removing the X feature, which I am not interested in, and combined fixed.acidity and citric.acid into sourness for the next investigation.
The top correlations with quality are:
1. alcohol: 0.476
2. volatile.acidity: -0.391
3. sulphates: 0.251
4. citric acid: 0.226
##
## Pearson's product-moment correlation
##
## data: feature and quality
## t = -16.954, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4313210 -0.3482032
## sample estimates:
## cor
## -0.3905578
##
## Pearson's product-moment correlation
##
## data: feature and quality
## t = 9.2875, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1793415 0.2723711
## sample estimates:
## cor
## 0.2263725
##
## Pearson's product-moment correlation
##
## data: feature and quality
## t = 10.38, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2049011 0.2967610
## sample estimates:
## cor
## 0.2513971
##
## Pearson's product-moment correlation
##
## data: feature and quality
## t = 21.639, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4373540 0.5132081
## sample estimates:
## cor
## 0.4761663
ANS: From the plots and correlation values, sulphates, citric acid and alcohol are positively related to quality, while volatile acidity is negatively related to it.
The plots for alcohol, sulphates and volatile acidity separate the three wine ratings very well, but the citric acid plot separates normal and good wines poorly.
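To read the Pearson test output above, it helps to see how the t statistic and the confidence interval follow from the correlation estimate. The sketch below uses the standard formulas (t from r and the degrees of freedom, and a Fisher z interval) with the alcohol result, r = 0.4761663 and df = 1597, copied from the output. The variable names are illustrative.

```r
# Values copied from the last Pearson test output above (alcohol vs. quality).
r  <- 0.4761663
df <- 1597
n  <- df + 2  # number of wines

# t statistic for testing that the true correlation is zero.
t_stat <- r * sqrt(df / (1 - r^2))

# 95% confidence interval via the Fisher z transformation.
z          <- atanh(r)
half_width <- qnorm(0.975) / sqrt(n - 3)
ci         <- tanh(c(z - half_width, z + half_width))
```

With 1599 wines the interval is narrow, so even the weaker correlations in the list (0.251 and 0.226) are clearly different from zero, although they remain weak in absolute terms.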
(not the main feature(s) of interest)?
##
## Pearson's product-moment correlation
##
## data: featureX and featureY
## t = -26.489, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5856550 -0.5174902
## sample estimates:
## cor
## -0.5524957
##
## Pearson's product-moment correlation
##
## data: featureX and featureY
## t = 13.159, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2678558 0.3563278
## sample estimates:
## cor
## 0.31277
##
## Pearson's product-moment correlation
##
## data: featureX and featureY
## t = 4.4188, df = 1597, p-value = 1.059e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.06121189 0.15807276
## sample estimates:
## cor
## 0.1099032
ANS: I found that citric acid and volatile acidity are the most strongly correlated of these pairs.
citric acid and volatile acidity: -0.5524957
citric acid and sulphates: 0.31277
citric acid and alcohol: 0.1099032
ANS: Among the features of interest, alcohol percentage has the highest correlation with quality (0.476).
Across all pairs of features, free.SO2 and total.SO2 have the highest correlation. (0.66
First I need to prepare alcohol.level for the multivariate plots.
The plot shows that excellent wines mostly sit in the top left, good wines in the middle and normal wines in the bottom right.
The Wine rating.vs.alcohol.level.vs.volatile.acidity plot shows that:
the ratio of excellent wines in the “medium” alcohol grade is very high in the volatile.acidity range 0.25-0.4.
The Wine rating.vs.alcohol.level.vs.citric.acid plot shows that the ratio of excellent wines in the “medium” alcohol grade is very high for citric.acid at 0 and 0.4.
The Wine rating.vs.alcohol.level.vs.total.SO2 plot shows that the ratio of excellent wines in the “medium” alcohol grade is very high for total.SO2 at 5-30.
The Wine rating.vs.alcohol.level.vs.sulphates plot shows that the ratio of excellent wines in the “medium” alcohol grade is very high in the sulphates range 0.7-0.9.
No pattern is noticeable here.
investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?
The plots show that alcohol is the feature with the highest impact.
The Wine rating.vs.alcohol.level.vs.volatile.acidity plot shows that the ratio of excellent wines in the “medium” alcohol grade is very high in the volatile.acidity range 0.25-0.4.
The Wine rating.vs.alcohol.level.vs.citric.acid plot shows that the ratio of excellent wines in the “medium” alcohol grade is very high for citric.acid at 0 and 0.35-0.5.
The Wine rating.vs.alcohol.level.vs.total.SO2 plot shows that the ratio of excellent wines in the “medium” alcohol grade is very high for total.SO2 at 5-30.
The Wine rating.vs.alcohol.level.vs.sulphates plot shows that the ratio of excellent wines in the “medium” alcohol grade is very high in the sulphates range 0.7-0.9.
The Wine rating.vs.alcohol.level.vs.sourness.vs.chlorides plot shows that it is hard to determine wine quality by tongue (chlorides and sourness).
It is very surprising that smell (total.SO2) has an influence on the wine rating but taste (chlorides and sourness) does not.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
The distribution of alcohol appears right skewed, with the median at 10.2%, and 90% of the red wines have an alcohol content between 9.2% and 12.5%. Perhaps demand for red wines and buyers' purchasing habits shape the plot this way.
Alcohol percentage, sulphates and citric acid correlate positively with wine rating, while volatile acidity correlates negatively.
The rating of red wines correlates with these features in the following order:
(strong) --> (weak)
alcohol > volatile.acidity > sulphates > citric acid
## normal good excellent
## 242 308 89
## normal good excellent
## 476 192 12
## normal good excellent
## 26 138 116
Plot 3.1 shows that red wines with low volatile acidity and a high alcohol percentage tend to have a higher rating. Plots 3.2.1-3.2.3 show that excellent red wines have the highest median potassium sulphate, and that the proportion of excellent red wines is greater at the medium alcohol level than at the other alcohol levels.
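The proportion claim can be checked against the three rating tables printed above. The output does not label which table belongs to which alcohol level, so the sketch below keeps the neutral, hypothetical names `table_1` to `table_3` in print order and only computes the share of excellent wines in each.

```r
# Counts copied from the three rating tables printed above, in print order.
# Which alcohol level each table belongs to is not labelled in the output.
rating_by_level <- rbind(
  table_1 = c(normal = 242, good = 308, excellent = 89),
  table_2 = c(normal = 476, good = 192, excellent = 12),
  table_3 = c(normal = 26,  good = 138, excellent = 116)
)

# Share of excellent wines within each table, in percent.
excellent_share <- rating_by_level[, "excellent"] / rowSums(rating_by_level)
excellent_pct   <- round(100 * excellent_share, 1)
```

The shares differ widely between the three groups (roughly 14%, 2% and 41%), which is the kind of gap the statement about the medium alcohol level describes, provided the third table is indeed the medium level.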
## 'data.frame': 1599 obs. of 15 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity: num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.SO2 : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.SO2 : num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## $ sourness : num 5.18 5.46 5.48 8.1 5.18 ...
## $ wine_rating : Ord.factor w/ 3 levels "normal"<"good"<..: 1 1 1 2 1 1 1 3 3 1 ...
## $ alcohol.level : Ord.factor w/ 3 levels "low alcohol"<..: 1 1 1 1 1 1 1 2 1 2 ...
The data set contains 1599 wines from 2009. I started by understanding the distributions and variables in the data set and tried to interpret them in terms of senses that humans can perceive. First, I found that the distribution of alcohol is right skewed; I believe that demand for red wine drives this distribution. From the bivariate and multivariate analysis, I found no evidence that the sour and salty tastes influence the quality of wine, but smell (total sulfur dioxide), addictive content (alcohol), volatile acidity, citric acid and sulphates do influence it. At a low alcohol percentage we hardly find any excellent wine_rating; most wines are normal, and some good wines can be found if they have total SO2 in the range 5-60 and sulphates in the range 0.53-0.73. At a medium-low alcohol percentage, excellent wines can be found at low volatile acidity and total sulfur dioxide below 55, but most wines are rated normal or good. At a medium alcohol percentage, excellent red wines are found in a high proportion at total sulfur dioxide below 50 and sulphates above 0.65, and most wines are rated excellent or good.
I struggled at first to visualize multivariate plots that clearly present the relation of more than one feature to wine quality. Eventually I found that it becomes easier if I create a new variable that represents a feature as groups. Next, I could not clearly present the relations of the selected features with geom_point, since none of the features has a strong correlation value, but this became much better when I decided to use histograms.
After some research (Google) and asking a friend who drinks wine, I found that many significant variables are missing from the data, such as the type of grape, where the wine was made and the age of the wine. I am confident that if the data had these variables, I could provide a more insightful analysis of red wine quality.